Load packages, utility functions, global variables
source("~scripts/00 - Admin.R")
source("~scripts/01 - Utility Functions.R")
Resources
Vignette: https://cran.r-project.org/web/packages/osmdata/vignettes/osmdata.html
Map features: https://wiki.openstreetmap.org/wiki/Map_Features
?available_features() and ?available_tags([feature])
List of parameters:
Balt_bbox <- getbb("Baltimore") %>%
opq()
Query the bus stops in each city.
Tags used:
public_transport=*source("~scripts/11 - Read Transit SDG data.R")
# Balt_busStops <- Balt_bbox %>%
# add_osm_feature(key = "highway", value = "bus_stop") %>%
# osmdata_sf() %>%
# # keep only the points. Note that the query returned 4333 points, 9 polygons, and 1 multi-line feature
# .$osm_points
Make quick maps using only the point data
# source("~scripts/30 - Read basemaps.R")
sdg_basemaps <- readRDS("~objects/30/30_sdg_basemaps.rds")
ggmap(Balt_map) +
geom_sf(data = Balt_busStops,
inherit.aes = FALSE, # this is necessary
alpha = 0.5) +
labs(title = "Bus Stops in Baltimore",
caption = "Data source: OSM",
x = "",
y = "")
base_url <- "https://api.ohsome.org/v0.9"
api_metadata <- GET(url= paste(base_url, "/metadata", sep = "")) %>%
content("text") %>%
fromJSON()
The below tells us the database contains data from October 8, 2007 to May 23, 2020.
api_metadata$extractRegion$temporalExtent
## $fromTimestamp
## [1] "2007-10-08T00:00:00Z"
##
## $toTimestamp
## [1] "2020-06-17T03:00:00Z"
Use this endpoint to aggregate OSM data, e.g. counts, areas, lengths, and users (contributors).
Let’s look at the trends for mapping bus stops in Baltimore.
# api_bbox <- getbb("Baltimore") %>% bbox_to_string() # note that the order of the coords is flipped from what the database needs
balt_bbox <- "-76.770759, 39.1308816,-76.450759,39.4508816"
# api_bbox <- "8.6581,49.3836,8.7225,49.4363"
api_keys <- "highway"
api_values <- "bus_stop"
monthly <- "2007-11-01/2020-05-23/P1M"
api_data <- GET(url = paste(base_url, "/elements/count", sep = ""),
query = list(
bboxes = balt_bbox,
keys = api_keys,
values = api_values,
time = monthly))
busStops_hist <- content(api_data, as = "text") %>%
fromJSON() %>%
.$result
The query from osmdata showed 4333 bus stops in Baltimore currently. The OSHDB query shows 4241 bus stops as of May 1, 2020.
ggplot(busStops_hist,
aes(x = as.Date(timestamp),
y = value)) +
geom_line() +
theme_bw() +
labs(title = "Bus Stops in Baltimore Mapped In OSM Over Time",
caption = "Source: OSHDB, ohsome API",
x = "Date",
y = "Count")
Use this endpoint to pull historical snapshots of OSM features.
The column names for the resulting dataframe seem odd - is there a better way to convert the API geoJSON response into sf?
# is there a more elegant way to do the below?
extraction_api_data <- GET(url= paste(base_url, "/elements/geometry", sep = ""),
query = list(
bboxes = balt_bbox,
keys = api_keys,
values = api_values,
types = "point", # do I want to limit it to points only?
time = monthly))
busStops_geom_hist <- read_sf(extraction_api_data)
busStops_geom_hist$X.snapshotTimestamp <- as.Date(busStops_geom_hist$X.snapshotTimestamp)
busStops_geom_hist
## Simple feature collection with 84352 features and 3 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -76.77029 ymin: 39.13266 xmax: -76.45081 ymax: 39.44865
## geographic CRS: WGS 84
## # A tibble: 84,352 x 4
## X.osmId X.snapshotTimestamp highway geometry
## <chr> <date> <chr> <POINT [°]>
## 1 node/1806310072 2013-04-01 bus_stop (-76.46212 39.3776)
## 2 node/1806310072 2013-05-01 bus_stop (-76.46212 39.3776)
## 3 node/1806310072 2013-06-01 bus_stop (-76.46212 39.3776)
## 4 node/1806310072 2013-07-01 bus_stop (-76.46212 39.3776)
## 5 node/1806310072 2013-08-01 bus_stop (-76.46212 39.3776)
## 6 node/1806310072 2013-09-01 bus_stop (-76.46212 39.3776)
## 7 node/1806310072 2013-10-01 bus_stop (-76.46212 39.3776)
## 8 node/1806310072 2013-11-01 bus_stop (-76.46212 39.3776)
## 9 node/1806310072 2013-12-01 bus_stop (-76.46212 39.3776)
## 10 node/1806310072 2014-01-01 bus_stop (-76.46212 39.3776)
## # ... with 84,342 more rows
# busStops_geom_hist <- content(extraction_api_data, as = "text") %>%
# fromJSON()
# write(busStops_geom_hist %>% toJSON(), "test.json")
According to our aggregated data, there was a massive spike in bus stops in Baltimore on OSM at the beginning of February, 2019: from 280 to 3968.
First, does the geometry data have the same number of observations as the aggregated data? It’s very close - it may be a matter of the time of day queried.
busStops_changeMap <- busStops_geom_hist %>%
filter(X.snapshotTimestamp %in% as.Date(c("2019-02-01", "2019-01-01")))
busStops_changeMap %>% group_by(X.snapshotTimestamp) %>%
summarize(count = n())
## Simple feature collection with 2 features and 2 fields
## geometry type: MULTIPOINT
## dimension: XY
## bbox: xmin: -76.77029 ymin: 39.13266 xmax: -76.45081 ymax: 39.44865
## geographic CRS: WGS 84
## # A tibble: 2 x 3
## X.snapshotTimesta~ count geometry
## <date> <int> <MULTIPOINT [°]>
## 1 2019-01-01 272 ((-76.7639 39.18362), (-76.76289 39.31459), (-76.761~
## 2 2019-02-01 3960 ((-76.77029 39.41797), (-76.77025 39.41795), (-76.77~
Next, compare to two months side-by-side on a map.
ggmap(sdg_basemaps$Baltimore) +
geom_sf(data = busStops_changeMap,
inherit.aes = FALSE,
alpha = 0.5) +
labs(title = "Change in Bus Stops Mapped on OSM in Baltimore",
subtitle = "Number of bus stops jumped from 280 in January 2019 to 3968 in February 2019.",
caption = "Data source: OSHDB",
x = "",
y = "") +
facet_wrap(~ X.snapshotTimestamp)